Improved cognitive performance in trace amine-associated receptor 5 (TAAR5) knock-out mice

Trace amine-associated receptors (TAARs) are a family of G protein-coupled receptors present in mammals in the brain and several peripheral organs. Apart from its olfactory role, TAAR5 is expressed in the major limbic brain areas and regulates brain serotonin functions and emotional behaviours. However, most of its functions remain undiscovered. Given the role of serotonin and limbic regions in some aspects of cognition, we used a temporal decision-making task to unveil a possible role of TAAR5 in cognitive processes. We found that TAAR5 knock-out (KO) mice performed better overall, owing to a reduced number of errors, and displayed a greater rate of improvement at the task than wild-type (WT) littermates. However, task-related parameters, such as timing accuracy and uncertainty, did not differ significantly between groups. Overall, we show that TAAR5 modulates specific domains of cognition, highlighting a new role in brain physiology.


Results
The switch task: a temporal decision-making task in the home cage. To investigate the impact of TAAR5 on higher cognitive functions, we tested mice on a temporal discrimination task called the switch task 30 (Supplementary Fig. S1a). This task requires a fine judgment of two different temporal signals that last in the range of seconds 31 . The training lasted two weeks after an initial week of pre-training in which the animals familiarised themselves with the operant wall in the home cage. The operant wall consisted of three holes/hoppers (central, left and right), each equipped with an infrared beam to detect nose-poking (NP) activity. Each hole was additionally fitted with a light on top to signal the start, the duration and the end of every trial 32 . During the first week of training, the mice had to learn to discriminate between two intervals: one of 3 s (short signal) and the other of 9 s (long signal). Each time interval was associated with one of the two lateral hoppers, and its duration was signalled by the light above the hopper. Short and long signals were randomly intermixed. During the second week, probe trials were introduced with a probability of 20% on each side. Probe trials consisted of a light signal of the same duration, but no reward was delivered after a correct response. Probe trials were randomly intermixed with regular trials to assess the accuracy of time perception and the perseverance of nose-poking activity around the learned target time.
The main advantage of this task is that it allows the animals' behaviour to be monitored continuously in their home cage (24 h a day, for several weeks), reducing animal stress and highlighting subtle differences even between mouse substrains 32 . This advantage makes it possible to obtain statistically powerful results with a reduced number of animals. We first checked whether the TAAR5-KO mice showed alterations in their circadian parameters compared to WT mice. We observed no differences between the groups in the circadian period (Supplementary Fig. S1b-e) or in the activity distribution along the 24 h cycle (Supplementary Fig. S1d,g) in either week of training. A significant effect of time was present in the first week of training (Supplementary Fig. S1c), likely due to a learning dynamic that required higher activity at the beginning of the training to obtain enough food. This activity decreased over the training as performance improved, and it increased again in the second week, where it remained constant (Supplementary Fig. S1f). The increase during the second week was due to the addition of unrewarded probe trials, which required higher engagement in the task to obtain the same amount of food. The continuous monitoring of mouse performance allowed us to track its evolution over time and the learning of fine cognitive functions. With this task, we expected to identify when and how a change in the cognitive performance of TAAR5-KO mice emerged during training.
TAAR5-KO mice show better performance over training. In a previous study, TAAR5-KO mice were found to show less anxiety and an antidepressant-like phenotype 15 . Here, we asked whether this alteration in emotional behaviour affected fine cognitive functions and potentially contributed to a better performance in temporal decision-making tasks. Indeed, we observed that KO mice performed better on average for the entire duration of the training (Fig. 1). This improved performance was evident from the first day of training, with KO mice reaching a steady performance by day 2 (Fig. 1a). In contrast, WT mice showed a significantly lower performance (Fig. 1a-f). We further explored whether this effect was specific to the light or the dark phase. We found that in both phases and for the entire duration of the training, KO mice performed better than WT mice (Fig. 1b,g). A more temporally refined analysis revealed that this difference was maintained hourly over the circadian cycle (Fig. 1c,h). We had already excluded that the better performance was related to altered activity patterns (hyper- or hypo-activity), as shown in Supplementary Fig. S1. Furthermore, this difference could not be related to different food intake and feeding behaviour, as both groups received a comparable number of pellets and performed a comparable number of trials on a daily and hourly basis (Fig. 1d, e, i, l; Supplementary Fig. S2).
The better performance of TAAR5-KO mice and the comparable number of trials between the two groups suggest that the difference should lie in the number of rewarded trials (performance also includes probe trials, which are correct but not rewarded). However, the absolute number of rewarded trials did not differ between the two groups along the circadian cycle (Supplementary Fig. S2a-d), suggesting that both groups received the same amount of food. Instead, the difference resided in the reward rate over days (Fig. 1a, Supplementary Fig. S2b-e) and was partially observable along the circadian cycle too (Fig. 1c, Supplementary Fig. S2c-f), suggesting that TAAR5-KO mice were proportionally more efficient at the task even though the overall number of trials and rewards received was similar between the two groups. Furthermore, we found proportionally more probe trials for TAAR5-KO mice compared to their littermate controls (Supplementary Fig. S2g,h,i), confirming that the better performance of TAAR5-KO mice was due to a combination of successfully rewarded trials and accurately performed trials. The efficiency of TAAR5-KO mice could depend on two factors. One is that they make fewer mistakes and potentially learn faster, suggesting that they are more efficient and engaged in the task. The other factor is task-related and assumes that TAAR5-KO mice are more accurate in temporal decision-making tasks. We explored both possibilities.
Figure 1 (caption, partial). (2-way ANOVA, no effect of time p = 0.9, F = 0.19, effect of group p < 0.005, F = 11.54). (g) Similar to panel b for week 2 (2-way ANOVA, light phase: effect of time p < 0.005, F = 9.83, no effect of group p = 0.06, F = 3.5; dark phase: no effect of time p = 0.8, F = 0.29, effect of group p < 0.005, F = 13.7). (h) Similar to panel c for week 2 (2-way ANOVA, effect of time p < 0.005, F = 34.42, effect of group p = 0.008, F = 7.32). (i) Similar to panel d for week 2 (2-way ANOVA, no effect of time p = 0.8, F = 0.29, no effect of group p = 0.07, F = 3.22). (l) Similar to panel e for week 2 (2-way ANOVA, effect of time p < 0.005, F = 71.4, no effect of group p = 0.08, F = 3.1).
www.nature.com/scientificreports/
TAAR5-KO mice make fewer mistakes and have a higher rate of improvement. We hypothesised that TAAR5-KO mice performed better due to a general, non-task-related cognitive ability to learn better and remain engaged in the task for a longer time. Therefore, we first evaluated the error trials to see where these mice were performing better. Then, we examined the learning phase and rate of improvement to investigate whether they were learning earlier or were consistently improving more than WT mice. Indeed, TAAR5-KO mice made fewer errors over both training days and the circadian cycle (Fig. 2, Supplementary Fig. S3a, b, e, f). To further explore the type of errors most frequently made by WT mice, we separated the analysis into time-out and timing error trials. Time-out trials ended without any response, as the maximum time window allowed to nose-poke in one of the locations elapsed without a choice being made. Timing error trials are instead those in which the animal made the incorrect choice. We found fewer time-out trials over the circadian cycle (Fig. 2b, f) and over the two weeks of training (Fig. 2a, e) for TAAR5-KO mice, suggesting that these mice are overall more engaged in the task. Indeed, KO mice complete their trials by making choices much more often than WT mice, and this is also true during the light phase, when animals are sleepier. However, the reduced number of time-out trials might also suggest impulsivity in KO mice. To exclude this hypothesis, we analysed the intertrial interval (ITI). If TAAR5-KO mice were more impulsive, we would expect a shorter ITI. Instead, we observed no difference between WT and KO mice in the average ITI duration on a daily (Fig. 2i,j) and hourly (Fig. 2k,l) basis. The timing error trials were significantly different between groups over days of training and the circadian cycle (Fig. 2c, d, g, h, Supplementary Fig. S3d, h). In particular, TAAR5-KO mice were more accurate (fewer timing error trials) than WT mice, especially during the dark phase, when animals are more active (Fig. 2d, h).
To further investigate when and how the improvement in performance emerged in TAAR5-KO mice, we analysed learning during the first week of training. Multiple factors could determine the higher performance over time: earlier and better learning, a better rate of improvement over time, or a combination of both. We defined learning as the time point (or trial) at which the cumulative number of correct trials (Supplementary Fig. S3i) showed the maximum inflexion (Fig. 3a, see Methods). We found that both groups learned early in training, KO mice within the first day and all WT mice within the second day (Fig. 3b), with no significant difference between groups. They also learned at the beginning of the dark phase, when mice become more active (Fig. 3b, right panel). The learning trial and time of learning were not significantly different between groups. Neither was the learning rate, defined as the change in the slope of the regression between before and after the learning point (Fig. 3c, see Methods), despite a marginal trend suggesting better improvement by TAAR5-KO mice (Kolmogorov-Smirnov test, p = 0.07).
The above definition of learning is somewhat arbitrary. Therefore, we tested our hypothesis with an alternative definition of learning as the trial after which the animal performed at or above 80% correct for at least 20 consecutive trials. We found similar results in terms of day of learning (Supplementary Fig. S3k) and a significant difference in terms of time of learning (Supplementary Fig. S3l), with KO mice learning at the beginning of the dark phase compared to WT mice.
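As an illustration of the alternative criterion, the following Python sketch (not the MATLAB code used for the analyses) returns the first trial that opens a block of 20 consecutive trials with at least 80% correct responses. Reading the criterion as a sliding block of fixed length is an assumption, and the function name and the boolean `outcomes` list are hypothetical.

```python
def criterion_trial(outcomes, window=20, threshold=0.8):
    """Index of the first trial that starts `window` consecutive trials
    with a fraction correct >= `threshold`; None if never reached.

    `outcomes` is a list of booleans (True = correct trial), in trial order.
    """
    # Number of correct trials in the first block (running sum afterwards).
    correct = sum(outcomes[:window])
    for start in range(len(outcomes) - window + 1):
        if start > 0:
            # Slide the block by one trial: add the new one, drop the old one.
            correct += outcomes[start + window - 1] - outcomes[start - 1]
        if correct / window >= threshold:
            return start
    return None


# Example: 10 error trials followed by 30 correct trials.
example = [False] * 10 + [True] * 30
learning_index = criterion_trial(example)
```

The time of learning would then be read from the timestamp of the returned trial.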
Since TAAR5-KO mice learn as well as WT mice and at the same time, we hypothesised that TAAR5-KO mice might show a higher rate of improvement than WT mice over time. Therefore, we computed the correct rate (see Methods) per hour over the training (Supplementary Fig. S3j). We observed a clear circadian effect resulting in a steeper correct rate during the dark phases compared to the light phases. We compared the average correct rates around the time the light is turned off (hour 0 in Fig. 3d) over days, and we observed an increasingly steep correct rate for both groups over days. However, TAAR5-KO mice approached the optimal correct rate (the diagonal dashed line, Fig. 3d) faster and earlier. To further compare the distributions of correct rates across subjects and over time, we quantified the slope of the cumulative correct rate curve after the light switch for each subject on each day of training. We found a significant effect of group and time, suggesting that TAAR5-KO mice reached a better performance sooner and that their rate of improvement was also higher. These results support the hypothesis that the performance improvement is not task-related but is instead due to higher engagement (fewer time-out trials) and better cognitive flexibility (higher rate of improvement). However, the significantly lower timing error rate compared to the time-out rate (Supplementary Fig. S3c,d,g,h) did not exclude that TAAR5-KO mice might show significant alterations in timing parameters. We explored this option below.
Interval-timing is preserved in TAAR5-KO mice. To fully explore the efficiency of KO mice in this behavioural paradigm, we checked the accuracy in task-related parameters. In particular, probe trials were introduced during the second week of training to investigate the persistence of nose-poke activity around the target time when the reward was not delivered. Two patterns of poking behaviour are possible. Prolonged poking might suggest perseveration in reward-seeking, whereas reduced poking activity suggests either inactivity, which we excluded (Supplementary Fig. S1), or inattention. To explore the former, we evaluated the distribution of nose-pokes during probe trials for short (Fig. 4a,b) and long duration (Fig. 4b) trials. We found no difference in the distributions of nose-poke activity between TAAR5-KO and WT mice (Fig. 4c) for either short or long probes, suggesting that TAAR5-KO mice have intact interval-timing estimation.
TAAR5-KO mice have optimal temporal accuracy. Relevant task-related parameters of the switch task include temporal accuracy and uncertainty (see Methods). Typically, control animals develop an optimal strategy to solve this task: they move to the short location soon after self-initiating the trial and wait until the short time elapses. If the light signal is short, the animal pokes in the short location; otherwise, it switches to the long location and waits for the long duration to elapse. The time of the switch from the short to the long location is called the 'switch latency'. By knowing the distribution of the switch latencies, we can estimate how close the behaviour is to an optimal one 33 (see Methods). For each subject, we assessed the switch latency distribution parameters from the fitted normal distribution 41 . The average switch latency reflects the subject's target switch latency, also called timing accuracy (μ), while the dispersion around the mean (the coefficient of variation, CV = σ/μ) reflects the endogenous timing uncertainty. We estimated the accuracy and uncertainty of every subject during the first (Fig. 5 top panels) and second (Fig. 5 bottom panels) week of training. Both groups performed near-perfectly (Fig. 5a, d), and we found no difference in the distributions of the parameters between TAAR5-KO and WT mice (Fig. 5b, c, e, f). These results suggest that the performance of TAAR5-KO mice is not related to the specific task demands but is instead a general feature of this mouse line, which is likely to show consistent performance improvement across a wide range of behavioural and cognitive tasks.
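The two timing parameters can be illustrated with a minimal Python sketch. It assumes that the switch latencies of one subject are available as a list of values in seconds, and it estimates the normal parameters from the sample mean and sample standard deviation, which may differ from the fitting procedure used for the figures. The function name and the example values are hypothetical.

```python
from statistics import NormalDist


def timing_parameters(switch_latencies_s):
    """Return (mu, cv) for one subject.

    mu is the mean switch latency (timing accuracy) and
    cv = sigma / mu is the coefficient of variation (timing uncertainty).
    """
    # NormalDist.from_samples estimates the mean and the sample
    # standard deviation of a normal distribution from the data.
    dist = NormalDist.from_samples(switch_latencies_s)
    return dist.mean, dist.stdev / dist.mean


# Made-up latencies (s) lying between the 3 s and 9 s signals.
mu, cv = timing_parameters([5.0, 6.0, 7.0])
```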

Discussion
TAAR5 expression in the CNS was demonstrated in recent studies, and some of its putative functions in brain physiology were characterised [13][14][15]34 . TAAR5 is involved in regulating emotional behaviour, and TAAR5-KO mice show an anxiolytic and antidepressant-like phenotype 15 . In this study, we evaluated the role of TAAR5 in cognitive processes and showed that TAAR5-KO mice performed better by making fewer errors and displaying a higher rate of improvement in performance. In particular, we found that TAAR5-KO mice were more engaged in the task (fewer time-out trials), especially during the light phase, when mice are typically sleepier, and were more accurate (fewer timing errors), especially during the dark phase. This improvement in performance was not due to earlier learning; instead, we found a consistently higher rate of improvement throughout the training. Apart from its role in olfaction, the understanding of TAAR5 functions in brain physiology is still in its infancy. Because of its low expression, initial reports did not find TAAR5 outside the olfactory epithelium 12 . However, independent reports using different techniques show a discrete TAAR5 expression in several brain areas, demonstrating its presence in limbic regions such as the amygdala, entorhinal cortex, nucleus accumbens, and thalamic and hypothalamic nuclei [13][14][15]17 . Recently, analysis of transcriptomic datasets derived from human samples revealed a low but ubiquitous expression of TAAR5 in limbic and cortical areas 13,14 . A similar situation initially arose in TAAR1 studies, since TAAR1 was found at low levels in discrete brain regions 3,35 . However, TAAR1-KO mice display a clear phenotype, and many reports have demonstrated its relevant role in dopamine, serotonin and glutamate homeostasis 8,9 . TAAR1 selective agonists are now in late-stage clinical trials as potential antipsychotic agents 36 .
TAAR5-KO mice did not show gross abnormalities or an overt neurological phenotype 15 . However, a series of behavioural tests assessing emotional behaviour highlighted that TAAR5-KO mice are less anxious and show an antidepressant-like phenotype compared to WT littermates 15 . The levels of serotonin and its metabolites are also altered in this mouse line, and the hypothermic effect of the 5-HT1A agonist 8-OH-DPAT is increased. Another report shows that striatal dopamine levels and the number of dopamine neurons in the substantia nigra are increased and, interestingly, that neurogenesis in the subventricular and subgranular zones is increased in mutants 37 . Recently, an altered sensorimotor function in TAAR5-KO mice was also demonstrated 31 . Some of these previously observed alterations might suggest that changes in performance could occur due to an increased eagerness for reward rather than enhanced cognitive flexibility. However, we showed that WT and KO mice performed a comparable number of trials (Fig. 1d,e,i,l) and rewarded trials (Supplementary Fig. S2a,d) and had a similar distribution of reaction times (Supplementary Fig. S3m, Methods). These results exclude a reward-related motivational state as a potential contributor to the changes observed.
To unveil the possible roles of TAAR5 in cognition, we used a home cage behavioural paradigm that tested temporal decision-making in mice. Home cage behavioural testing allows a large amount of data to be collected and reveals subtle differences between substrains of animals 32 . In this study, this paradigm highlighted several interesting aspects in the analysis of the cumulative correct rate (Fig. 3d) over the days of training at the time the light switched off. First, both genotypes show a clear step-change in the average hourly activity between the light and dark phases. Second, before the light switches off (before 0 in Fig. 3d), KO and WT mice have a 50% or lower probability of success, respectively, over the days of training, with no effect of time. This phenomenon suggests a sleepiness effect unrelated to the level of training. Third, as soon as the light switches off (after trial 0 in Fig. 3d), we can observe a clear improvement in performance, which increases over the training days. Finally, we showed that TAAR5-KO mice had, on average, a better rate of improvement across all training days (Fig. 3e).
In this test, animals have to learn the task to obtain a food pellet, in particular to discriminate between two time intervals. Both WT and TAAR5-KO mice learned the task quite quickly (Fig. 3). The speed of learning is due to the continuous exposure to the task (24/7), which forces the animals to work to obtain food in their home cage. This removes the stress caused by moving the animal from one cage to another and allows mice to engage in the task at their own rhythm, which typically follows a circadian oscillation over the 24 h.
Interestingly, KO mice performed better in the test, indicating an increased accuracy in the decision-making process visible from the lower timing error trials (Fig. 2). The better performance was also evident in the light phase, a period of the day where animals are usually sleepier and make more errors. Indeed, WT mice showed higher time-out trials. These trials occur when the animal self-initiates the trial but is not keen to complete it. The elevated number of time-out trials during the light phase suggests that these trials do not happen due to a momentary inattention to the task but are more likely due to disengagement and sleepiness.
Another interesting distinction between time-out trials and timing error trials is their temporal evolution. In particular, the latter reflects the learning dynamic; the former is constant over the entire training. Figure 2c shows the decrease in timing error trials over the first week of training, which is then maintained constant throughout the second week (Fig. 2g). This dynamic nicely resembles the improvement and maintenance of the performance seen in Fig. 1a, f. On the contrary, time-out trials remain constant over the entire training, supporting our previous claim that these trials are a reflection of sleepiness.
If we look at task-related parameters, both groups of mice displayed correct interval-timing estimation and an optimal combination of temporal accuracy and uncertainty. Overall, our results suggest that TAAR5-KO mice are better learners, more engaged in the task, and adapt more flexibly to changes in the environment. This overall better performance is not related to the specific task demands but may be a general feature of these mice.
To confirm these data, more behavioural tests specific to each cognitive domain are needed to understand the precise role of TAAR5 in cognition. It should be noted that these subtle differences may be difficult to unveil using standard tests performed over a few hours in the daylight phase. Cognitive assays usually reveal deficits more easily than pro-cognitive effects in WT animals. Another option would be to inject a TAAR5 antagonist to mimic these actions in vivo, similarly to what has been done in TAAR1 studies. Although the first TAAR5 antagonists were described some years ago 38 , no other selective and potent compounds have been reported so far. In principle, a selective TAAR5 antagonist may be a potential new drug with several therapeutic indications and a new mechanism of action. Apart from the endogenous agonist TMA, which has a clear role in olfaction, only one other putative agonist has been found and tested in animal models, namely α-NETA. Interestingly, in mice and rats, this compound was able to induce psychotic-like behavioural abnormalities, including features related to the cognitive deficits present in psychotic patients 20,21 .
How TAAR5 influences cognitive domains is still under investigation. A recent report shows that TAAR5-KO mice have an increased number of dopaminergic neurons in the SNpc and increased levels of dopamine in the striatum 37 . Correct levels of dopamine and dopamine signalling are fundamental for normal cognitive function. Altered levels of dopamine in pathological conditions, such as Parkinson's disease and schizophrenia, lead to cognitive deficits 39 . Thus, further studies are needed to understand the contribution of dopamine alterations in TAAR5-KO mice to the cognitive alterations seen in our behavioural paradigm. Serotonin, a neurotransmitter whose levels are altered in TAAR5-KO mice, plays an interesting role in cognition 23,24 . Although the serotonergic system is very complex and serotonergic receptors are a large family of GPCRs with a myriad of functions, there is a general consensus that serotonin is important in decision making. Moreover, compounds that increase serotonergic transmission, such as antidepressants, are used in neuropsychiatric diseases where cognitive impairments are present, in particular impairments in decision making 23 . 5-HT1A agonists, especially ones acting mostly on post-synaptic receptors, increase behavioural flexibility and facilitate performance 40 . Similarly, anxiolytics may be beneficial in decision making, particularly when the decision is influenced by an emotional component 26 . An anxiolytic effect may also facilitate flexible choice behaviour, increasing the speed of finding the optimal strategy. TAAR5-KO mice display a decreased total content of serotonin in the striatum and the hippocampus 15 . However, there are no data on the extracellular serotonin levels or the electrophysiological properties of the serotonergic neurons in these mice. A recent study showed that TAAR5-KO mice had increased adult neurogenesis in both the subventricular zone and the subgranular zone 37 . Adult neurogenesis is linked to many aspects of brain physiology, including cognition and the behavioural effects of stress and antidepressants 41 . In particular, several pieces of evidence suggest a role of adult neurogenesis in cognitive flexibility and that this action may reduce anxiety and depressive-like behaviour 41 .
In conclusion, we showed that TAAR5 might be considered a new player in cognitive processes and a potential new drug target for various neuropsychiatric disorders involving deficiencies in emotional states and cognition.

Materials and methods
Mice and husbandry. Groups of 8-12-week-old male mice were studied (TAAR5-KO and their WT littermates). Each group included 6 mice, which were generated as described previously 15 . All mice were group-housed for two weeks before the experiment with food and water ad libitum under a 12:12 light-dark cycle (lights on from 7:00 to 19:00). The week before the experiment, 20 mg food pellets were gradually mixed into regular food for habituation. Then mice were singly housed in type III TSE PhenoMaster cages (TSE Systems, Bad Homburg, Germany) and subjected to the experimental phases. All procedures involving animals and their care were carried out in accordance with the guidelines established by the European Community Council (Directive 2010/63/EU of September 22, 2010) and were approved by the Italian Ministry of Health. During the experimental phases, animal wellbeing was monitored daily. If the weight loss was between 10 and 20% (referring to the free-feeding weight taken on day one), one or two additional standard food pellets (approx. 1.3 g) were given, respectively. If weight loss exceeded 20% of the free-feeding weight, animals had to be culled. All the animals in this study completed the experiment.
Apparatus and procedure. In this study, we used an automated operant wall (Cognition and Welfare, COWE) developed by TSE Systems (Germany) and based on its PhenoMaster System. The device consists of three holes/hoppers in a metal wall inserted in type III cages. Each hole is equipped with infrared beams that detect nose-poking. An LED with a luminous intensity of 4 mcd (millicandela) is mounted in each hopper to serve as a stimulus. The two lateral hoppers are attached to independent hidden feeders that dispense 20 mg dustless precision pellets (BioServ, USA). The sensors (LED and infrared beams) and the actuator (feeder) were remotely controlled via computer to design trial-by-trial protocols for individual and/or group cages. Each COWE cage (n = 12) was kept in an individually ventilated, sound-proof, light-controlled cubicle. The house light (approximately 100-110 lx) was timed on a 12:12 light-dark schedule as described above.
Experimental design. The whole experiment consisted of two experimental phases following a pre-training phase and included 12 animals. During the pre-training phase, all mice familiarised themselves with the COWE cage to obtain food pellets from both lateral hoppers. This pre-training phase consisted of self-initiated trials: nose-poking in the central hopper triggered the switch-on of the lights in the three hoppers. Nose-poking in the lateral hoppers gave access to food rewards. No temporal limitations were imposed during this phase, and the goal of the pre-training phase was to develop the association between hopper location and food pellet. The trial ended when the animal had received a pellet from each side, at which point the lights switched off. Each trial was followed by an intertrial interval (ITI). The ITI was set as a 30 s fixed delay plus a random interval drawn from a geometric distribution with a mean of 60 s. The mice could not initiate a new trial during the ITI. After four days of pre-training, all mice were introduced to the two consecutive experimental phases, each lasting about a week. During the first week, mice were trained in the switch task 32 . In this task, animals had to discriminate the duration of two light signals (i.e., short- vs. long-latency signals, called here short and long trials) to obtain a food pellet in a trial. The duration of the light signal determined the location of the pellet availability. Short (T_S) and long (T_L) trials were randomly intermixed with the same probability (P(T_S) = P(T_L) = 0.5).
The left hopper was associated with the short trials, whereas the right hopper was associated with the long trials. The first nose poke after the short or long duration at the corresponding lateral hopper was reinforced with a food pellet, the trial was declared finished and the ITI started. Any nose-poke at the long location after a short signal or vice-versa was not reinforced, triggering the start of the ITI. These trials were classified as timing error trials. If the animal self-initiated the trial but did not engage in the task by not poking in the hoppers, the trial ended after 30 s with no reward. These trials were classified as time-out trials. Short-latency signals lasted 3 s and long-latency signals lasted 9 s.
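The three trial outcomes described above can be summarised in a short Python sketch. It is only an illustration under stated assumptions: poke times are measured from trial start, pokes made before the signal has elapsed are ignored, and the 30 s limit is also counted from trial start. The function name and the `(time, hopper)` tuple layout are hypothetical.

```python
SHORT_S, LONG_S = 3.0, 9.0  # signal durations given in the text


def classify_trial(signal_s, pokes, timeout_s=30.0):
    """Classify one trial as 'correct', 'timing_error' or 'time_out'.

    signal_s : duration of the light signal (3 s or 9 s).
    pokes    : list of (time_since_trial_start_s, hopper) tuples,
               with hopper equal to 'left' or 'right'.
    """
    # Left hopper is paired with short signals, right hopper with long ones.
    target = "left" if signal_s == SHORT_S else "right"
    for t, hopper in sorted(pokes):
        # Only the first lateral poke after the signal has elapsed counts.
        if signal_s <= t <= timeout_s:
            return "correct" if hopper == target else "timing_error"
    return "time_out"


# A long trial: the animal waits at the short side, then switches.
outcome = classify_trial(LONG_S, [(2.0, "left"), (9.4, "right")])
```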
In the second week of training, we introduced 20% probe trials for both short-latency and long-latency trials. This means that short probes (Sp) and long probes (Lp) were introduced with the same conditional probabilities, P(Sp|T_S) = P(Lp|T_L) = 0.2. During probe trials, the signal was presented as for regular trials, but the correct responses of the animals were not reinforced. Probe trials lasted 30 s each, followed by an ITI as described earlier in this section. This manipulation allowed us to further characterise mouse timed behaviour in its full complexity.
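The trial statistics described in this section can be sketched as a small Python sampler. This is an illustration rather than the protocol code run on the cages: it assumes the geometric part of the ITI is discretised in 1 s steps with support starting at 1 s, and the function names and dictionary keys are hypothetical.

```python
import math
import random

SHORT_S, LONG_S = 3, 9  # signal durations (s)


def sample_iti(rng, fixed_s=30, mean_random_s=60.0):
    """ITI = fixed delay + geometric interval with the given mean (s)."""
    p = 1.0 / mean_random_s  # geometric success probability, mean = 1/p
    u = rng.random()         # uniform in [0, 1)
    # Inverse-transform sampling of a geometric variable on {1, 2, ...}.
    k = max(1, math.ceil(math.log(1.0 - u) / math.log(1.0 - p)))
    return fixed_s + k


def sample_trial(rng, week):
    """Draw one trial: short/long with p = 0.5, probe with p = 0.2 in week 2."""
    is_short = rng.random() < 0.5
    is_probe = week == 2 and rng.random() < 0.2
    return {
        "signal_s": SHORT_S if is_short else LONG_S,
        "hopper": "left" if is_short else "right",
        "probe": is_probe,
        "iti_s": sample_iti(rng),
    }


rng = random.Random(0)  # seeded for reproducibility
week2_trials = [sample_trial(rng, week=2) for _ in range(20000)]
```

Over many draws, about half of the trials are short, about one in five is a probe, and the mean ITI is close to 90 s.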

Data analysis.
We recorded all events in the COWE cages with millisecond resolution, and these events were timestamped. Each timestamp was paired with an event code that identifies a specific type of event (e.g., light on/off, nose in/out). This strategy allowed us to standardise specific codes for data analysis across laboratories 32 . Every analysis was performed using MATLAB (www.mathworks.it) software.
All the analyses were performed on the first and second weeks of training separately to highlight the impact of probe trials. The analyses along the circadian cycle were computed for each subject by quantifying the parameter (e.g. performance, time-out trials, etc.) for each hour and then averaging across three-hour intervals. From the resulting dataset, we computed a group average. The performance of each mouse was computed as the number of correct trials divided by the total number of trials performed. Correct trials included all rewarded trials during the first week of training; during the second week of training, correct trials also included probe trials.
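The hourly quantification and three-hour averaging can be illustrated in Python as follows. The sketch assumes each trial has already been reduced to a pair of clock hour (0-23) and a correct/incorrect flag; this data layout and the function names are hypothetical.

```python
from collections import defaultdict


def hourly_performance(trials):
    """Fraction of correct trials per clock hour.

    trials: iterable of (hour_of_day, is_correct) pairs for one subject.
    """
    counts = defaultdict(lambda: [0, 0])  # hour -> [n_correct, n_total]
    for hour, is_correct in trials:
        counts[hour][0] += int(is_correct)
        counts[hour][1] += 1
    return {h: c / n for h, (c, n) in counts.items()}


def three_hour_bins(hourly):
    """Average the hourly values within consecutive three-hour intervals."""
    bins = defaultdict(list)
    for hour, value in hourly.items():
        bins[hour // 3].append(value)  # bin 0 = hours 0-2, bin 1 = hours 3-5, ...
    return {b: sum(v) / len(v) for b, v in sorted(bins.items())}


# Made-up trials for one subject.
trials = [(0, True), (0, False), (1, True), (2, True), (3, False)]
binned = three_hour_bins(hourly_performance(trials))
```

A group average is then the mean of the per-subject binned values for each interval.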
We computed the cumulative number of correct trials to identify the learning point for each subject. This curve is the cumulative sum of correct (+1) and error (-1) trials over time. To identify the learning trial, we fitted a piecewise linear regression model to each trial, allowing a minimum of five trials from the beginning of training. The model was applied to a moving window of twenty trials, advanced in steps of five trials. The model comprised a robust regression line fitted to the cumulative curve before each trial and another line fitted to the cumulative curve after the trial. The learning trial was then identified as the first trial having the maximum increase in slope between the regression lines fitted before and after the trial. The day and time of learning were reconstructed from the timestamps of the identified learning trial. The learning rate is the difference between the slopes of the regression lines before and after the learning trial.
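The change-point search can be sketched as follows. This is a simplified illustration rather than the published implementation: ordinary least squares replaces the robust regression, and the window and step are interpreted as up to twenty trials on each side of a candidate trial, with candidates evaluated every five trials. The function names are hypothetical.

```python
from itertools import accumulate

def ols_slope(y):
    """Ordinary least-squares slope of y against its index (not robust)."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

def learning_trial(outcomes, min_trials=5, window=20, step=5):
    """Return (trial index, slope increase) for the largest slope change.

    outcomes -- sequence of booleans, True for a correct trial
    """
    # Cumulative curve: +1 for each correct trial, -1 for each error.
    curve = list(accumulate(1 if ok else -1 for ok in outcomes))
    best_trial, best_gain = None, float("-inf")
    for t in range(min_trials, len(curve) - min_trials + 1, step):
        before = curve[max(0, t - window):t]
        after = curve[t:t + window]
        gain = ols_slope(after) - ols_slope(before)
        if gain > best_gain:  # strict ">" keeps the first maximum
            best_trial, best_gain = t, gain
    return best_trial, best_gain
```

The returned slope increase corresponds to the learning rate as defined above; the day and time of learning would follow from the timestamp of the returned trial.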
To quantify the rate of improvement across days during the transition between the light and dark phases, we computed the cumulative correct rate curve for each subject. This curve covered an interval of 22 h around the switch-off of the house light (10 h before and 12 h after). For each hour, each subject and each day, we computed the rate of correct trials.
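One way to tabulate the hourly values for a single subject and day is sketched below. It assumes that trial times are already expressed in hours relative to lights-off and that "rate of correct trials" means the fraction of correct trials within each hour; both are assumptions, as is the function name `hourly_correct_rate`.

```python
import math

def hourly_correct_rate(trials, start=-10, stop=12):
    """Fraction of correct trials in each hour around lights-off.

    trials -- iterable of (hours_from_lights_off, is_correct) pairs
    Returns a dict keyed by hour (start .. stop - 1); hours without
    trials map to None.
    """
    counts = {h: [0, 0] for h in range(start, stop)}  # [correct, total]
    for t, ok in trials:
        h = math.floor(t)
        if h in counts:
            counts[h][0] += bool(ok)
            counts[h][1] += 1
    return {h: (c / n if n else None) for h, (c, n) in counts.items()}
```

With the default arguments the result has 22 hourly bins, from 10 h before to 12 h after the switch-off of the house light.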
To assess circadian rhythmicity, we estimated the circadian period by non-linear curve fitting to the number of nose-pokes over the recording time. The periodic function that was fitted to these data is defined by Eq. (1).
where T = (t_1, ..., t_n) are the time points (15 min) during the recording and K = (A, P, Φ) are, respectively, the amplitude, period, and phase of the sinusoidal function. Best-fit coefficients (A, P, Φ) were determined by minimising the mean-square difference between F and the data. The fit was repeated for multiple values of the parameter P (from 21 to 27 h every 0.5 h). The goodness of the fit was quantified by the Pearson Correlation Coefficient (CC) between the data of each subject and the corresponding fit function F. The best fit for P converged to the Subjective Period (Supplementary Fig. 1b) for every initialisation of the parameter P between 22 and 26 h for every subject.
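The period search can be illustrated with a simplified grid version of the procedure. The sketch assumes a sinusoid of the form A·cos(2πt/P + Φ) around the mean of the data, which is an assumption and may differ from the exact function used in the study. For each candidate period the amplitude and phase are obtained by linear least squares, and the period with the highest Pearson CC is retained; the non-linear optimiser of the original analysis is not reproduced. All function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_fixed_period(t, y, period):
    """Least-squares amplitude and phase of a sinusoid with a fixed period."""
    w = 2 * math.pi / period
    c = [math.cos(w * ti) for ti in t]
    s = [math.sin(w * ti) for ti in t]
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]  # fit the oscillation around the mean
    scc = sum(ci * ci for ci in c)
    sss = sum(si * si for si in s)
    scs = sum(ci * si for ci, si in zip(c, s))
    scy = sum(ci * v for ci, v in zip(c, yc))
    ssy = sum(si * v for si, v in zip(s, yc))
    det = scc * sss - scs * scs
    a = (scy * sss - ssy * scs) / det  # coefficient of cos
    b = (ssy * scc - scy * scs) / det  # coefficient of sin
    fit = [y_mean + a * ci + b * si for ci, si in zip(c, s)]
    # a*cos(wt) + b*sin(wt) equals A*cos(wt + phase)
    return math.hypot(a, b), math.atan2(-b, a), fit

def best_period(t, y, periods=None):
    """Return (period, amplitude, phase, CC) with the highest CC."""
    if periods is None:
        periods = [21 + 0.5 * k for k in range(13)]  # 21 to 27 h every 0.5 h
    best = None
    for p in periods:
        amp, phase, fit = fit_fixed_period(t, y, p)
        cc = pearson(y, fit)
        if best is None or cc > best[3]:
            best = (p, amp, phase, cc)
    return best
```

Here `t` would hold the recording times in hours (one value per 15-min bin) and `y` the nose-poke counts in those bins.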
Probe trials were introduced to assess how timing behaviour changed when correct responses were reinforced probabilistically (note that these probabilities were equal between the two trial types). The raster plot of NP activity (Fig. 4a) for each subject and each trial type was analysed. We showed the empirical normalised NP distribution for an example subject (Fig. 4b) and compared the cumulative distribution of NP between groups (Fig. 4c).
We assessed the temporal decision-making performance by analysing the switch latency or accuracy (Fig. 5). The switch latency is defined only for long trials and is the trial time at which the mouse leaves the short-latency location for the long-latency location 42. For each subject, the distribution of the switch latencies was fitted with a Gaussian function. From each Gaussian fit, we estimated the mean (μ) and standard deviation (σ). We considered μ as the accuracy in timing estimation and the coefficient of variation (CV = σ/μ), which is the dispersion of switch latencies relative to the mean, as the timing uncertainty 43. The dependence of optimal target-switch latency on the level of timing uncertainty was formulated in 42 and then expanded in 32. This formulation, called the Normalized Expected Gain function (Eq. 2), allows the evaluation of timed behaviours within the framework of optimality based on experienced probabilistic reinforcement, endogenous timing uncertainty, and the payoff matrix.
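As a small worked example of the timing measures, the sketch below summarises a set of switch latencies by their mean, standard deviation, and coefficient of variation, taken here as the standard deviation divided by the mean. It uses the sample moments, which are the maximum-likelihood estimates for a Gaussian; this is an assumption, and the fitting routine used in the study may differ. The function name `timing_summary` is illustrative.

```python
import statistics

def timing_summary(switch_latencies):
    """Return (mu, sigma, cv) for one subject's switch latencies in seconds."""
    mu = statistics.fmean(switch_latencies)
    sigma = statistics.pstdev(switch_latencies)  # population (ML) estimate
    cv = sigma / mu  # dispersion relative to the mean
    return mu, sigma, cv
```

A lower CV indicates lower timing uncertainty, while μ indicates where in the trial the mouse tends to switch.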
The optimal target switch latency for a given mouse was defined as the μ that maximises the output of Eq. 2, where Φ is the normal cumulative distribution function. We computed the reaction time for each trial as the time elapsed between the end of the short or long signal and the first nose-poke to the left or right hopper, respectively. For each subject, we estimated the empirical probability distribution function (PDF in Supplementary Fig. 3m) and compared the mean ± SEM between the two groups.
Statistical analysis. Data were analysed with a one-way or repeated-measures ANOVA in MATLAB. The significance level and the F-statistic (the ratio of the mean squares) are reported for every test in the figure legends.
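To make the definition of the F-statistic concrete, the following sketch computes it for a one-way design as the between-group mean square divided by the within-group mean square. It is a textbook illustration, not the MATLAB routine used in the study, and it does not cover the repeated-measures case or the p-value. The function name `one_way_anova_f` is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For two groups, such as knock-out and WT, the between-group degrees of freedom equal 1 and F equals the square of the two-sample t-statistic with pooled variance.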
Ethical approval. All procedures involving animals and their care were carried out in accordance with the guidelines established by the European Community Council (Directive 2010/63/EU of September 22, 2010) and were approved by the Italian Ministry of Health. The study is reported in accordance with the ARRIVE guidelines.

Data availability
A GitHub repository containing the data and the MATLAB code to reproduce the main findings of this paper is available at https://github.com/sibangi/TAAR5_repo.